Database systems
• Database: a collection of shared data objects (d1, d2,
… dn) that can be accessed by users
– every database has some correctness constraints defined on it (called consistency assertions or integrity constraint) – a database is said to be consistent if the values of its data satisfy these constraints
– nomenclature:
Transactions
• Transactions: set of actions on a database that are grouped in a single logical unit of interaction
• query: read-only transaction • update: transaction modifies at least one object
– assumptions:
• transactions preserve consistency • transactions terminate in finite time
• A user interacts with a database through complex operations called transactions
– a transaction consists of a sequence of read, write, compute statements that refers to data objects in the database – examples: on-line booking, bank teller operations, ...
– ACID properties
• • • • Atomic (all or nothing) Consistent (satisfy consistent rules) Isolated (not interfere with each other) Durable (committed transaction will not be lost)
The Transaction Model (1)
• Old method of updating a master tape is fault tolerant.
– Contrast with modern online database that is updated in place
The Transaction Model (2)
• The atomicity of transactions can be re-created with special primitives.
Primitive BEGIN_TRANSACTION END_TRANSACTION ABORT_TRANSACTION READ WRITE Description Make the start of a transaction Terminate the transaction and try to commit Kill the transaction and restore the old values Read data from a file, a table, or otherwise Write data to a file, a table, or otherwise
1
The Transaction Model (3)
•
Nested and Distributed Transactions
Difference between nested and distributed transaction
– a) A nested transaction occurs on logically and physically separate databases (independent sub-transactions) – b) A distributed transaction takes place on a logically single but physically distributed database
• Example
– a) Transaction to reserve three flights commits – b) Transaction aborts when third flight is unavailable
BEGIN_TRANSACTION reserve CMH -> JFK; reserve JFK -> Nairobi; reserve Nairobi -> Malindi; END_TRANSACTION (a) BEGIN_TRANSACTION reserve CMH -> JFK; reserve JFK -> Nairobi; reserve Nairobi -> Malindi full => ABORT_TRANSACTION
(b)
Implementation #1: Private Workspace
a) b) c) The file index and disk blocks for a three-block file The situation after a transaction has modified block 0 and appended block 3 After committing
Implementation #2:Writeahead Log
• • a) A transaction b) – d) The log before each statement is executed
x = 0; y = 0; BEGIN_TRANSACTION; x = x + 1; y=y+2 x = y * y; END_TRANSACTION; (a)
Log
Log
Log
[x = 0 / 1]
[x = 0 / 1] [y = 0/2]
[x = 0 / 1] [y = 0/2] [x = 1/4] (d)
(b)
(c)
2
Concurrency control (1)
• Concurrency control is the set of mechanisms put in place to preserve consistency and isolation
– If we were to execute only one transaction at a time the (i.e. sequentially) implementation of consistency and isolation would be trivially solved – However most of the time we are interested in providing concurrency – allowing several transactions to be executed simultaneously
Concurrency Control (2)
• Concurrency control is best understood in terms of three managers
– Data manager – Scheduler – Transaction manager
• The implementation of distributed transactions further complicates concurrency control because it requires handling multiple databases
• The three managers implement a division of tasks
– concurrency control is responsibility of the scheduler
Concurrency Control (3)
• The managers scheme can be extended to the distributed scheme
Concurrency Control Theory
• We want to be able to execute several transactions simultaneously
– serial execution preserves consistency (because each transaction individually preserves consistency) but is inefficient – what we really want is concurrent (i.e. interleaved) execution
• What we need is a method that enables us to answer the following questions:
– Given the log of a certain execution of a set of transaction, is that execution legal? (i.e. preserves consistency and isolation)
3
Conflicts
• Two transactions T1 and T2 are said to conflict if
– they access the same data objects, and – at least one access is a write, i.e. r-w, w-r, or w-w conflict
Logs
• Log: chronological order of actions performed by transactions under a concurrency control algorithm
– a log over T describes an interleaved execution of T0, T1, …, Tn
– Example:
• Note that if two transactions have no conflict, then their execution can be interleaved without restrictions
T1 = r1[x] r1[z] w1[x] T2 = r2[y] r2[z] w2[y] T3 = w3[x] r3[y] w3[z]
L1 w3[x] r1[x] r3[y] r2[y] w3[z] r2[z] r1[z] w2[y] w1[x]
L2 w3[x] r3[y] w3[z] r2[y] r2[z] w2[y] r1[x] r1[z] w1[x]
Equivalent logs
• Two logs are equivalent if
– all the transactions in both logs see the same state of the database, – they leave the database in the same final state
Serial Logs
• Serial log:
– a log in which all actions of a transaction terminate before any action of the next transaction starts – example: L2 is a serial log
• Two logs over a transaction set are clearly equivalent iff:
– every read operation reads from (i.e. sees the value written by) the same write in both logs, and – both logs have the same final writes
• Serializable log:
– a log in which actions from several transactions T0, T1, …, Tn are interleaved, and that has the same output and the same effect on the database as the serial execution of a permutation of T0, T1, …, Tn – Example: log L1 is equivalent to serial log L2
• Example: L1 and L2 in previous example are equivalent
• A serializable log is equivalent to a serial log, thus represents a correct execution
– question: is there some criterion for telling if a log is serializable?
4
Serialization Graph
• Let L be a log over a set of transactions T0, T1, …, Tn
– the serialization graph SG(L) of L is a directed graph in which the nodes are the transactions Ti, and whose edges satisfy the condition:
• edge Ti → Tj then for some x either ri[x] < wj[x] or wi[x] < rj[x] or wi[x] < wj[x]
The Serializability Theorem
• Th.: A log L is serializable iff SG(L) is acyclic
– no cycles in AG(L) ⇔ L is serializable
• The serial log corresponding to L can be determined by topologically sorting AG(L)
– example:
T3 T2 T1
– example:
T1 T3 T2
Serialization graph of L1
T3 → T1
→ T2
OR T3 → T2
→ T1
5